Isolated nucleic acid molecule encoding a

HUMAN SEMAPHORIN MOLECULE, AND USES THEREOF

RELATED APPLICATIONS

This application is a continuation-in-part of Serial No. 09/483,618, filed January 14, 2000,
which is a continuation-in-part of Serial No. 09/406,117, filed September 27, 1999, which
is a continuation-in-part of Application Serial Number 09/196,716, filed on
November 20, 1998, the disclosure of which is incorporated by reference in its entirety. It
is also a continuation-in-part of PCT application PCT/US99/27430, filed November 19,
1999, designating the United States. 

FIELD OF THE INVENTION

The invention relates to isolated nucleic acid molecules which encode human analogs
of semaphorin, the proteins encoded thereby, as well as their use. The molecules described
herein were isolated and identified using the ORESTES method, which is described herein. 

BACKGROUND AND PRIOR ART

The area of nucleic acid research has seen tremendous advances in knowledge and 
understanding in the recent past. One of the goals in the field has been the determination of 
the sequence of the entire chromosomal component, or "genome" of organisms. This has 
been achieved for several non-nucleated organisms (prokaryotes), and for one organism with
a nucleus, a "eukaryote". Eukaryotes have much more complex genomes than prokaryotes,
for reasons which will be discussed infra. 

The interest in sequencing entire genomes of organisms has been explained in detail 
in both technical and non-technical publications, and need not be repeated here. See, for
example, Venter, et al., "Shotgun Sequencing of The Human Genome", Science 280:1640-1642
(1998); Pennisi, "A Planned Boost for Genome Sequencing, But the Plan Is in Flux",
Science 281: 148-149 (1998).

Various approaches to what is a large and complex project have been advanced. For
example, the so-called "Shotgun" approach, developed by Venter et al., is very well known.
In this approach, genomic DNA is cleaved into very small pieces, and these pieces are then
sequenced. The approach is repeated, and after an undefined number of repeats, sequences
are aligned to permit, at least in theory, a determination of the complete genomic sequence. 

This approach has been used by Venter et al on prokaryotes, and it has been proposed 
for use on more complex eukaryotes, such as humans. The proposed approach to eukaryotes 
is not without drawbacks and criticism, however. A sizable portion of the scientific 
community is of the view that the resulting information will be riddled with gaps. The 
human genome, in contrast to prokaryotic genomes, is characterized by a large number of
repetitive sequences. It is felt by many that the overlapping of repetitive sequences could 
lead to incorrect alignment of the larger fragments from which they are derived. 

A second approach, which has found more widespread acceptance, is to cleave the 
genome into relatively large fragments, and then to "map" the larger, non-sequenced 
fragments to show overlap prior to sequencing the material. After this overlapping, which 
results in a physical map of the genome, the segments are fragmented, and sequenced. While 
this approach should, in theory, eliminate the gaps in the sequence, it is time consuming and 
costly. Further, both of these approaches suffer from a fundamental drawback, as will all 
approaches which begin with eukaryotic genomic DNA, as will now be explained. 

Eukaryotic DNA consists of both "coding" and "non-coding" DNA. For purposes
of this invention, only coding DNA is under consideration, as it is this material which is 
transcribed and then translated into proteins. This coding DNA is sometimes referred to as 
"open reading frames" or "ORFs", and this terminology will be used hereafter. 

As compared to prokaryotes, eukaryotic DNA has a much more complex structure. 
Genes generally consist of a non-coding, regulatory portion of hundreds of nucleotides 
followed by coding regions ("exons"), separated by non-coding regions ("introns"). When 
DNA is transcribed into messenger RNA, or mRNA, and then translated into protein, it is 
only these exons which are of interest. It has been estimated that, for humans, of the 
approximately 3 billion nucleotides which make up the genome, only about 3% are coding 
sequences. The shotgun and mapping approaches referred to supra do not differentiate
between coding and non-coding regions. Hence, a method which would permit sequencing
of only coding regions would be of great interest, especially if the method permits 
development of longer "contigs" of sequence information. 

One such method is, in fact, known. This is the "Expressed Sequence Tag" or "EST"
approach. In this approach, one works with complementary DNA or "cDNA" rather than
genomic DNA. In brief, as indicated supra, genomic DNA is transcribed into mRNA. The 
mRNA contains the relevant ORF in contiguous form, i.e. without intervening introns. These 
molecules are very fragile and their existence transient. In the laboratory, one can employ
various enzymes, i.e., so-called "reverse transcriptases" to prepare complementary DNA, or 
"cDNA", which is much more stable than mRNA. One then sequences the cDNA, 
incompletely, from either the 5' or 3' end. These incomplete sequences, in theory, serve as
identifying "tags" for nucleic acid molecules of interest. Literally millions of ESTs have
been prepared, and are accessible via known data bases, such as GenBank.

There are problems with this approach as well. First, large amounts of extremely high 
quality mRNA are necessary, and this is not always available. Also, one must bear in mind 
that the non-coding regions of mRNA molecules are found at the 5' and 3' ends, and this is
carried over into the cDNA molecule. As a result, the information obtained may not be very 
useful. For example, it frequently provides no information about the actual protein encoded 
by the molecule. Clearly, there is a need for a system which provides more useful
information about nucleic acid molecules. 

Dias Neto et al., Gene 186: 135-142 (1997), the disclosure of which is incorporated 
by reference, applied a method for determining sequence information from the parasite S.
mansoni which involved, inter alia, the use of arbitrary primers, and low stringency
hybridization conditions. There is no discussion in this paper of the ability to identify and 
to sequence internal portions of an open reading frame. The paper itself appears to have only 
been cited a single time by other investigators. Nor is there any discussion within the 
reference of investigating sequences for overlap, so as to develop "contigs", i.e., longer
nucleotide sequences prepared by determining overlap of two smaller sequences. 

U.S. Patent No. 5,487,985 to McClelland, et al., incorporated by reference, teaches 
a method referred to as "AP-PCR" or arbitrarily primed polymerase chain reaction. The 



WO 01/51518 



PCT/US01/01275 




method employs a single primer designed so that there is a degree of internal mismatch 
between the primer and the template. Following amplification with the primer, a second 
PCR is carried out. The amplification products are separated on a gel to yield a so-called
"fingerprint" of the organism or individual under study. The '985 patent does not discuss the
identification of internal portions of open reading frames, nor does it discuss the analysis of
sequences to develop contigs. 

The semaphorins are one of the most prominent of the conserved families of axon 
guidance molecules. As described by, e.g., Van Vactor et al., Curr Biol 25(9): R201-204
(1999), they are expressed in many different regions of the developing nervous system,
and are known to play important roles in establishing accurate axonal projections. 

The semaphorins have been divided into 7 different classes, depending upon whether 
they are secreted molecules or transmembrane molecules. For example, the members of
the Class III semaphorin family are secreted molecules, known to act as repulsive factors for
specific axonal populations. "Semaphorin III," which has also been referred to as "Collapsin-1"
or "Sema D," causes growth cone collapse, as well as axonal retraction and repulsion, in
sensory and sympathetic axons in culture. See, e.g., Yu et al., Neuron 22(1):11-14 (1999).

Eckhardt, et al., Mol. Cell Neurosci. 9(5/6):409-19 (1997), have identified a murine
semaphorin cDNA, which is referred to as "Sema VIb." The molecule comprises a
characteristic, extracellular semaphorin domain, but lacks both the immunoglobulin domain
and thrombospondin repeats that have been observed in other vertebrate, transmembrane
semaphorins. Sema VIb is expressed in subregions of the nervous system during
development, and is especially prominent in muscle tissue. Sema VIb mRNA is expressed
ubiquitously in adulthood. Studies carried out in vitro have shown that this molecule binds
the SH3 domain of c-src. This may indicate a role in intracellular signaling via an src-related
cascade. 

Christensen, et al., Canc. Res. 58(6):1238-44 (1998), identified a murine semaphorin/collapsin
family in metastatic murine mammary adenocarcinoma cell lines. The molecule
was identified using differential display-PCR, and is now known as "M-Sema H." It is less
abundant in normal tissues as compared to tumor cells. This work is significant because it
is the first example of positive correlation of semaphorin expression with tumor progression,
and suggests a role for M-Sema H during metastasis and nerve axon development.
Christensen et al. have deposited three sequences in GENBANK, referred to as Z93947,
Z93948, and Z80941. Also see PCT Application WO9947671 to Christensen et al., published
on September 23, 1999.

There is an extensive patent literature on members of the semaphorin family. See, 
e.g., US Patent No. 5,981,222 to Jacobs, et al.; 5,935,865 to Goodman, et al.; 5,807,826 to 
Goodman, et al.; 5,639,856 to Goodman, et al. International applications describing
semaphorins, in addition to WO9947671, include
WO9904263; WO9902556; WO9853065; WO9822504; WO9815628; WO9811216; and
WO9720928. All of these U.S. and international applications, which represent some, but not
all of the patent literature in this area, are incorporated by reference. 

Of further interest is the work of [...]
and Trusolino, et al., FASEB J 12(13):1267-80 (1998). These references discuss so-called
"scatter factors" and their receptors. Comoglio, et al., identify a new gene family of
molecules which share homology to the scatter factor receptors, and identify these new
molecules as semaphorin receptors, and suggest that deregulation of semaphorin may confer
invasive and metastatic properties to cancer cells.

The ORESTES methodology, described herein, has been applied to human breast 
tumor cells, and a complete sequence of a human semaphorin related molecule has been 
identified. Homology analysis reveals that it is 84% identical to murine semaphorin VIb, 
discussed supra. This is the molecule with which it shares greatest identity. Different forms 
of the molecule have been identified as well. 

Previously, a human molecule was identified and referred to as Semaphorin R. See,
e.g., U.S. Patent application Serial No. 09/483,618, filed January 14, 2000, the disclosure of
which is incorporated by reference. The Semaphorin Nomenclature Committee, "Unified
Nomenclature for the Semaphorins/Collapsins," Cell 97:551-552 (May 28, 1999), the
disclosure of which is incorporated by reference, suggests renaming the members of the
semaphorin family and, in
accordance with this system, Semaphorin R would now be referred to as Semaphorin 6B. 

How these molecules were identified, as well as their uses, will be clear from the 
disclosure which follows.

BRIEF DESCRIPTION OF THE FIGURES

Figures 1A and 1B both show, schematically, prior art genome sequencing
approaches. 

Figure 1C shows the invention, schematically. 

Figure 2 presents both a theoretical probability curve (dark ovals) and actual results 
(white ovals), obtained when practicing the invention. The data points refer to the probability
of securing the sequence of a particular portion of cDNA molecule when practicing the 
invention. 

Figure 3 shows construction of a contig, using the invention. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

One aspect of the invention, as discussed supra, is a method for obtaining nucleotide 
sequence information from organisms, preferably information from open reading frames of 
cDNA of eukaryotic organisms. As a first step, messenger RNA ("mRNA") is extracted
from a cell. The extraction of mRNA is a standard technique, the details of which are well
known by the artisan of ordinary skill. For example, it is well known that eukaryotic mRNA,
as compared to other forms of RNA, is characterized by a "poly A" tail. One can separate
mRNA from other types of RNA by passing it over a column which contains oligomers of
the base thymidine. These "oligo dT" molecules hybridize to the poly A sequences on the
mRNA molecules, and these then remain on the column. Other approaches to separation of
mRNA are known. All can be used. If prokaryotic mRNA is being considered, separation
using poly A/poly T hybridization is not carried out. It is preferred to treat the resulting
material to reduce or to eliminate contamination by DNA. Adding a DNA degrading
enzyme, such as DNAase, is preferred. This is carried out prior to contact with the column.
It is also preferred to pass the purified RNA over the column at least twice.

The separated mRNA is then used to prepare a cDNA. The preparation of the cDNA 
represents the first inventive step in the method of the invention. To prepare the cDNA, the 
mRNA is combined with a sample of a single, arbitrary primer. By "arbitrary" is meant that 
the primer used does not have to be designed to correspond to any particular mRNA 
molecule. Indeed, it should not be, because the primer is going to be used to make all of the
cDNA. Details on the design of arbitrary primers can be found in Dias-Neto, et al., supra,
McClelland, et al., supra, and Serial No. 08/907,129 filed August 6, 1997 and incorporated
by reference. 

The primer is preferably at least 15 nucleotides long. Theoretically, it should not 
exceed about 50 nucleotides, but it can. Most preferably, the primer is 15-30 nucleotides 
long. While the sequence of the primer can be totally arbitrary, it is preferred that the total 
content of nucleotides "G" and "C" in the primer be compatible with the "G" and "C" content 
of the open reading frames of the organism under consideration. It is found that this favors 
amplification of the desired sequences. General rules of primer construction favor a G and 
C content of at least 50%. 
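
By way of illustration only, the G and C content of a candidate primer can be checked with a few lines of Python. This is a minimal sketch and not part of the described method; the function name gc_fraction is an assumption introduced here, and the sequence shown is one of the arbitrary primers listed in Example 1.

```python
def gc_fraction(primer: str) -> float:
    """Return the fraction of G and C bases in a primer (spaces are ignored)."""
    seq = primer.replace(" ", "").upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return sum(base in "GC" for base in seq) / len(seq)


# One of the arbitrary primers of Example 1; 15 of its 20 bases are G or C.
example_primer = "CCCGCTCCTC CTGAGCACCC"
print(f"GC content: {gc_fraction(example_primer):.0%}")
```

A primer whose value falls well below the GC usage of the organism's open reading frames could then be redesigned before use.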

"Arbitrary primer" as used herein does not exclude specific design choices within the
primers. For example, the four bases at the 3' end of a given primer are generally considered 
the most important portion for hybridization. Hence, it is desirable to include as many 
different primers as possible, to cover all variations within this 4 base sequence. There are 
256 variants possible, since there are four nucleotides. In order to identify products from a 
particular source, a "marker" sequence can be used, i.e., a stretch of predefined nucleotides. 
The remainder of the primer should be selected to correspond to overall GC usage, as 
described supra. Hence, for a primer 25 nucleotides long, the first 17 should correspond to
GC usage for the organism in question. Nucleotides 18-21 would be a "tag", such as
"GGCC". Then, all possible combinations of four nucleotides would follow, to produce 256
primers, which contain a known marker. This procedure could be repeated with a second set 
of primers, where the marker at 18-21 is different. 
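
The construction of such a primer set can be sketched as follows. This is an illustrative assumption rather than the procedure actually used: the 17 nucleotide prefix below is a made-up placeholder, and the function name tagged_primer_set is hypothetical. Only the layout (17 nucleotides, a 4 nucleotide tag at positions 18-21, then every possible 4 nucleotide 3' end) is taken from the text.

```python
from itertools import product


def tagged_primer_set(prefix: str, tag: str = "GGCC") -> list:
    """Build all 256 primers: 17-nt prefix + 4-nt tag + every 4-nt 3' end."""
    if len(prefix) != 17 or len(tag) != 4:
        raise ValueError("expected a 17-nt prefix and a 4-nt tag")
    # product("ACGT", repeat=4) enumerates the 4**4 = 256 possible 3' ends.
    return [prefix + tag + "".join(tail) for tail in product("ACGT", repeat=4)]


# Hypothetical prefix chosen only for illustration (not from the disclosure).
PLACEHOLDER_PREFIX = "GCCTGCAGGACCTGGAC"
primers = tagged_primer_set(PLACEHOLDER_PREFIX)
print(len(primers), primers[0], primers[-1])
```

A second set with a different marker is obtained by calling the same function with another value for tag.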

In practice, each set of variants is used with mRNA from a single source, and would 
permit the artisan to mark all sequences from a source, and still permit pooling. 

The primer is combined with the mRNA under low stringency conditions. What is 
meant by this is that the conditions are selected so that the primer will hybridize to partially,
rather than to only completely complementary sequences. Again, this is necessary because 
the primer will amplify an arbitrary sample of the mRNA pool, not just one sequence. There 
are standard rules and formulas for approximating high and low stringency, and the artisan of 
ordinary skill is familiar with these. Attention is drawn to Simpson, et al., U.S. Patent
application Serial No: 08/907,129, filed August 6, 1997, incorporated by reference, for more
information on this, as well as Dias-Neto, et al. and McClelland, et al., supra.

The arbitrary primer and mRNA are mixed with appropriate reagents, such as reverse
transcriptase, a buffer, and dNTPs, to yield a pool of single stranded, cDNA molecules. 

Once the single stranded cDNA is prepared, it is used in an amplification reaction. In 
this second reaction, it is preferred, but not required, that the single primer used is identical 
to the first primer, as described supra, and that low stringency conditions be employed. Using 
identical primers tends to produce longer products, but this is not required. 

The result of this amplification is a mini library. One can carry out cDNA synthesis 
in multiple, separate reactions, using different arbitrary primers, "A", "B", "C" and "D". Four
pools of single stranded cDNA are then produced, i.e., "A", "B", "C" and "D". Each pool is
then amplified using each of the four primers, to generate mini-libraries AA, AB, AC, AD, 
BA, BB, BC, BD, CA, CB, CC, CD, DA, DB, DC, and DD. These mini-libraries are used in 
the sequencing reaction which follows. 
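
The sixteen mini-libraries named above are simply every ordered pairing of a cDNA synthesis primer with an amplification primer. A short sketch, offered only as an illustration (the variable names are assumptions):

```python
from itertools import product

primers = ["A", "B", "C", "D"]

# First letter: primer used for cDNA synthesis; second letter: primer used
# for the subsequent amplification of that cDNA pool.
mini_libraries = [rt + pcr for rt, pcr in product(primers, repeat=2)]
print(mini_libraries)
```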

Once the cDNA is prepared, the resulting products are isolated, such as by size 
fractionation on a gel. The resulting bands can be removed from the gel, such as by elution, 
and then subjected to standard methodologies for cloning and sequencing. 

Key to this feature of the invention, as is described herein, is the use of arbitrary 
primers under low stringency conditions. This combination permits the artisan to sequence 
internal regions of cDNA preferentially, as compared to the 5' and 3' ends, as is typical in
standard prior art approaches. Specifically, consider a portion of a cDNA molecule which is
a distance "S" from the 3' end of the molecule. For this portion of the molecule to be
amplified by a primer, the primer must bind on both sides of the region to be amplified. If the 
complete length of the molecule is represented by "L", the probability of a primer binding to 
the nucleic acid molecule on both sides of a point on the molecule is proportional to S(L-S).

The highest probability for inclusion within amplified cDNA is the exact middle of 
the molecule. The lowest probability, in contrast, is at the extreme 5' and 3' ends. To elaborate,
assume a point directly in the middle of a cDNA molecule, i.e., if the molecule is "x + 1"
nucleotides long, .5x nucleotides precede the midpoint, and .5x nucleotides follow it. The
likelihood of a primer hybridizing to a point on the molecule preceding the middle is .5x, and
following it is also .5x. If "x" is 1, then the probability of hybridization surrounding the
midpoint is .5(1-.5), or .25, i.e., 25%. Similarly, assume a point on the same molecule located
.9x away from the 3' end. In this case, since the molecule is "x" units long, the point is .1x
from the 5' end, i.e., .1 units precede it, and .9 units follow it. If the length is 1, then the
probability of hybridization surrounding this point is .9(1-.9), or 9%. Hence, by using a
primer and conditions which permit hybridization of the primer,
one actually secures the majority of amplified products from within a cDNA molecule, rather 
than at the ends. In figure 2 of this application, one sees a curve which results when the 
theoretical model is applied (dark ovals), and a curve obtained in practice (light ovals). It will
be seen that, remarkably, the practice of the invention is actually very close to the theory. 
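
The theoretical model can be reproduced numerically. The following is a minimal sketch under the assumption, used in the worked figures above, that the molecule length is normalized to 1; the function name amplification_weight is introduced here for illustration only.

```python
def amplification_weight(s: float, length: float = 1.0) -> float:
    """Relative chance that a point at distance s from one end lies between
    two primer binding sites, following the S(L - S) model."""
    if not 0 <= s <= length:
        raise ValueError("s must lie within the molecule")
    return s * (length - s)


# Tabulate the model at a few positions along a molecule of unit length.
for s in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"S = {s:.2f}  weight = {amplification_weight(s):.4f}")
```

The weight peaks at the midpoint (0.25) and falls to zero at either end, which is the shape of the theoretical curve described for Figure 2.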

One very practical result of this approach is that the mRNA is normalized, and bias
in copy number is eliminated. The probability of producing an EST from a given mRNA is
proportional to the length of that molecule and not its abundance within the source being 
analyzed. 

A further aspect of the invention is the construction of contigs, once the sequence 
information has been determined. One creates a contig by comparing sequence information 
and finding overlaps. For example, the last 300 nucleotides of a sequence may be identical 
to the first 300 nucleotides of a second sequence. The artisan can essentially splice the first 
and second sequences together, to produce a longer one. The splicing can be done with two 
or more sequences found in the particular experiment that is carried out, or by comparing
deduced sequences to sequences which are available in a public database, a private database,
a journal, or any other source of sequence information. 
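
The splicing of two overlapping sequences into a contig can be sketched as follows. This is a simplified illustration that assumes an exact match between the end of the first sequence and the start of the second; practical assembly tools also tolerate sequencing errors, which this sketch does not attempt. The function name and the min_overlap parameter are assumptions introduced here.

```python
from typing import Optional


def merge_by_overlap(first: str, second: str, min_overlap: int = 20) -> Optional[str]:
    """Join two sequences where the end of `first` exactly matches the start
    of `second`; return None if no overlap of at least min_overlap exists."""
    longest = min(len(first), len(second))
    # Try the longest possible overlap first, then progressively shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if first[-k:] == second[:k]:
            return first + second[k:]
    return None


# Toy sequences: the last 5 bases of the first equal the first 5 of the second.
print(merge_by_overlap("AAACCCGGG", "CCGGGTTT", min_overlap=3))
```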

A further aspect of the invention is the ability to compare information obtained using 
the inventive method to pre-existing information, in order to determine if a known nucleotide 
sequence is an internal sequence of a particular gene. This can be done because, as explained
supra, the method described herein generates an extremely high percentage of internal
sequences, with a very low percentage of sequences at the ends of a given molecule. The prior
art methods either generate predominantly terminal sequences, or internal sequences on a 
completely random basis. Hence, it is probable that nucleotide sequences of unknown origin 
are contained within various sources of sequence information. Data generated using the 
methods of this invention can be compared to this pre-existing information very easily, and 
can result in a determination that a particular nucleotide sequence is, in fact, an internal 
sequence.

The practice of the invention and how it is achieved will be seen in the examples
which follow. 

EXAMPLE 1 

This example describes the generation of a cDNA library in accordance with the 
invention. While colon cancer cells from a human were used, any cell could also be treated 
in the manner described herein. 

The mRNA was extracted from a sample of colon cancer cells, in accordance with 
standard methods well known to the artisan, and not repeated here. It was then divided into 
approximately 5ul aliquots, which contained anywhere from 1 to 10ng of mRNA. The
samples were then stored at -70°C until used. 

The aliquots of mRNA were then used to prepare single stranded cDNA, using 25 
pmol samples of a single, arbitrary primer. Several different experiments were carried out, 
using a different, single arbitrary primer in each case. 

The single, arbitrary primers used were: 

5' - GAAGCTGGTA AACAAAAGG - 3' SEQ ID NO: 8

5' - AGCTGCATGA TGTGAGCAAG - 3' SEQ ID NO: 9

5' - CCCGCTCCTC CTGAGCACCC - 3' SEQ ID NO: 10

5' - GAGTCGATTT CAGGTTG - 3' SEQ ID NO: 11

5' - TGCTTAAGTT CAGCGGG - 3' SEQ ID NO: 12

In each case, 25 pmols of arbitrary primer were mixed with the aliquot of mRNA, 100
units of Moloney murine leukemia virus reverse transcriptase, reverse transcriptase buffer
(25mM Tris-HCl, pH 8.3, 75mM KCl, 3mM MgCl2, 10mM DTT), and 100mM of each dNTP,
to a final volume of 20ul. The mixture was incubated for 30 minutes, at 37°C, to yield single
stranded cDNA.

EXAMPLE 2

The single stranded cDNA produced in example 1, supra, was used as the template in
a PCR amplification reaction. In this, a sample of 1ul of single stranded cDNA was
combined, together with the same primer that had been used to generate the cDNA.
Amplification was carried out, using 12uM of primer, 200uM of each dNTP, 1.5mM MgCl2,
1 unit of DNA polymerase, and buffer (50mM KCl, 10mM Tris-HCl, pH 9.0, and 0.1% Triton
X-100), to reach a final volume of 15ul. Then, 35 cycles of amplification were carried out,
1 cycle consisting of 95°C for 1 minute (denaturation), 37°C for 1 minute (annealing), and
extension at 72°C for 1 minute. In the final cycle, extension was increased to 5 minutes.
The amplification products were used in the analyses which follow. Additional experiments 
were also carried out, in the same fashion, using different primers. 

EXAMPLE 3 

In order to analyze the amplification products, 3ul samples were mixed with 3ul of 
sample buffer, 0.05% bromophenol blue, 0.05% xylene cyanol FF, and 7% sucrose (w/v), in 
distilled water, and then visualized on silver stained, 6% polyacrylamide gels, following 
Sanguinetti, et al., Biotechniques 17:3-6 (1994), incorporated by reference. 

The steps set forth supra result in banding patterns on the gel, each band representing 
a different sequence. The most complex banding patterns were analyzed, as discussed in 
example 4, infra. It is important to note that controls were run during the experiments, to 
make sure that genomic DNA had not contaminated the samples. In brief, the control 
experiments used mRNA and genomic DNA, without reverse transcription PCR. The profiles
obtained should differ, in each case, from those obtained using reverse transcribed mRNA, and
did so. 

EXAMPLE 4 

The cDNAs generated in the preceding examples were mixed, by pooling 10-20ul of 
each set of products into a final volume of 60ul, followed by electrophoresis through a 1% low
melting point agarose gel containing ethidium bromide to stain the cDNA fragments. Known 
DNA size standards were also provided. 

The gel portions containing fragments between 0.25 and 1.5 kilobases were excised, 
using a sterile razor blade. Excised agarose was then heated to 65°C for 10 minutes, in 1/10 
volume of NaOAc (3mM, pH 7.0), and cDNA was recovered via standard phenol/chloroform
extraction and ethanol precipitation, followed by resuspension in 40ul of water. The thus
recovered cDNA was used in the following experiments. 

EXAMPLE 5

The cDNA extracted supra was treated with 10 units of Klenow fragment DNA
polymerase, and 10 units of T4 polynucleotide kinase, for 45 minutes at 37°C. The reaction 
mixture was then extracted, once, with phenol, and the DNA was then recovered by passage 
through a standard Sephacryl S-200 column. Recovered cDNA was then ligated into the 
commercially available plasmid pUC18, and the plasmids were used to transform receptive 
E. coli, using standard methodologies. This resulted in sufficient amounts of individual 
cDNA molecules for the experiments which follow. 

EXAMPLE 6 

Individual bacterial clones were established from the transformants of example 5. 
These were then used to prepare sequencing templates, following standard methodologies and 
sequenced. Standard computational procedures, and publicly accessible databases were 
employed in analyzing the resulting sequences. There were some cases where the analysis 
revealed two, different cDNAs in the clone. This could be determined, since the primer 
sequence is present only at both ends of the cDNA. Thus, if the primer was found in the 
middle of the sequence, it indicated that the sequences on either side were from different 
cDNAs. The two sequences were treated as separate sequences in analyzing the results.
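
The check for clones containing two different cDNAs can be sketched in code. This is an illustrative assumption about how such a screen might be automated, not the procedure actually used: it treats any copy of the primer, or of its reverse complement, as a boundary, and returns the stretches lying between boundaries. The function names are hypothetical, and only exact primer matches are detected.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case ACGT)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_at_primer_sites(insert: str, primer: str) -> list:
    """Split a cloned insert at every exact copy of the primer or its reverse
    complement and return the non-empty stretches in between."""
    seq = insert.replace(" ", "").upper()
    primer = primer.replace(" ", "").upper()
    sites = "|".join(re.escape(s) for s in {primer, reverse_complement(primer)})
    return [piece for piece in re.split(f"(?:{sites})+", seq) if piece]


# A single cDNA yields one piece; a primer site in the middle yields two,
# which would then be analyzed as separate sequences.
primer = "GAGTCGATTT CAGGTTG"
p = primer.replace(" ", "")
toy_insert = p + "ACGTACGTAA" + reverse_complement(p) + p + "TTGGCCAATT" + reverse_complement(p)
print(split_at_primer_sites(toy_insert, primer))
```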

Of 413 cDNA sequences studied, 337 were not found in the public databases referred 
to, supra. Sixteen of these sequences had a partial match to known sequences, allowing a
contig to be formed. 

There were another 42 sequences which were similar, but not identical to, sequences 
in public databases, suggesting that these 42 sequences are related to the pre-existing material. 

Twenty six of the sequences were completely contained within known, complete 
human sequences. This permitted generation of the empirical curve shown in Figure 2. 
Twenty two of the twenty six sequences were completely or partially within open reading 
frames of known genes.

Some of the sequences obtained showed partial homology to known genes, suggesting
their function. Other sequences were found which showed no homology to known sequences. 

EXAMPLE 7

This example shows the use of the invention as applied to breast cancer cells. 

A sample of an infiltrative breast carcinoma with attached portions of normal tissues 
was operatively resected from a subject. The material was kept at -70°C until used. The
sample was characterized, inter alia, by a large tumor mass and a very small amount of normal
tissue.

Three x 20 micron-thick slices were taken across the tumor mass and any attached 
normal tissue was microdissected out to leave "pure" tumor tissue. One slice was treated to 
remove mRNA, as described supra. The primers used were those of SEQ ID NOS:
8 & 9, as well as

5' - AGGAGTGACG GTTGATCAGT - 3' SEQ ID NO: 13 

Reverse transcription was carried out as with the colon cancer sample, as described 
supra. Then, PCR amplification was carried out by combining 12.8uM of the same primer
used in the reverse transcription, 125uM of each dNTP, 1.5mM MgCl2, 1 unit of thermostable
DNA polymerase, and buffer (50mM KCl, 10mM Tris-HCl, pH 9.0, and 0.1% Triton X-100),
to a final volume of 20ul. Amplification was carried out by executing 1 cycle (denaturation
at 94°C for 1 minute, annealing at 37°C for 2 minutes, and extension at 72°C for 2 minutes),
followed by 34 cycles at 94°C for 45 seconds, annealing at 55°C for 1 minute and extension
at 72°C for 5 minutes. When analyzed for banding, as described supra, the samples revealed
a complex pattern. 

The products were eluted from their gels, cloned into pUC-18, and the plasmids were
transformed into E. coli strain DH5α, all as described supra. Plasmids were subjected to
minipreparation, using the known alkaline lysis method, and then about 150 of the molecules
were sequenced. Of these, 69% were not found in any databank consulted, and appear to 
represent new sequences. A total of 22% was characterized by large quantities of repetitive 
elements and retroviral sequences. A total of 4% corresponded to known human sequences,
another 4% to ribosomal RNA and mitochondrial sequences, and 8% were redundant
sequences. 

EXAMPLE 8 

An example of how a contig sequence can be built is described herein. 
With reference to figure 3, the darker portion is a sequence obtained in accordance 
with the invention. 

When the sequence was compared to sequences already accessible in databases, there 
was substantial overlap with a known sequence at the 3' end, and some overlap at the 5' end. 
This permitted construction of a 1,064 nucleotide long contig. The first sequence is a tentative
human consensus sequence, as taught by Adams, et al., Nature 377: 3-17 (1995), while the 
third sequence is an EST obtained from human gall bladder cells, identified as human gall 
bladder EST 51121. 

EXAMPLE 9 

Following the experiments described supra, the sequences of the molecules identified
therein were compared to sequences available to the public. One of these molecules, i.e., SEQ
ID NO: 1, was found to be somewhat homologous to rat semaphorin Z (GENBANK Accession No:
AB000776). The homology was found between nucleotides 780-897 of rat semaphorin Z and
nucleotides 6-123 of SEQ ID NO: 1. The homology is to the midpoint of the semaphorin
Z gene, in the coding region. Recent work by, e.g., Christensen, et al., Canc. Res. 58:1238-44
(1998), and Martin-Satue, et al., J. Surg. Oncol. 72:18-23 (1999), correlated expression of
members of the semaphorin gene family with cancer. As such, work was undertaken to 
identify the full length sequence from which SEQ ID NO: 1 was derived. 

To do this, RACE-PCR, in accordance with Frohman, et al., Meth. Enzymol. 218: 340- 
356 (1993), incorporated by reference, was carried out on a pool of human mammary gland 
cDNA. To do this, one of the oligonucleotide primers: 

cttggagtcatgtttcacgg 

(SEQ ID NO: 2) 
or 

gggatgctcttcacagctact 
(SEQ ID NO: 3) 

which are 5' and 3' oriented gene specific primers, was used, together with a primer that 
flanked either the 3' or 5' end of the cDNA molecule. More specifically, 5 ng of template 
were combined in a 25 μl PCR reaction, with 5 pmoles of each primer, 1.0 U rTth DNA 
polymerase, 800 μM of each dNTP, 1.5 mM MgCl2 and 7.5 μl of 3.3X commercially available 
PCR buffer. Thirty-five cycles of PCR were carried out, one cycle being defined as 40 
seconds at 94°C, 60 seconds at 60°C, and 120 seconds at 72°C. Before the thirty-five cycles, 
however, one touchdown program was run (one cycle of annealing at 72°C, two cycles at 
68°C and 64°C, with identical denaturing and extension conditions). This resulted in an 
extension of the 5' end (i.e., nucleotides 321-887 of the final product, which is SEQ ID 
NO: 6), but not the 3' end. 
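
The orientation of the two gene specific primers can be checked against SEQ ID NO: 1 as it appears in the sequence listing. The following Python sketch is offered only as an illustration and is not part of the protocol described above; the helper names `reverse_complement` and `locate` are hypothetical, and only the first 120 nucleotides of SEQ ID NO: 1 are used. 

```python
# First 120 nucleotides of SEQ ID NO: 1, copied from the sequence listing.
SEQ_ID_1_PREFIX = (
    "catacgggatgctcttcacagctactgttaccgacttcctagccattgatgctgtcatct"
    "accgcagcctcggggacaggcccaccctgcgcaccgtgaaacatgactccaagtggttca"
)
PRIMER_SEQ_ID_2 = "cttggagtcatgtttcacgg"
PRIMER_SEQ_ID_3 = "gggatgctcttcacagctact"

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(primer: str, template: str):
    """Find a primer on either strand of the template.

    Returns (strand, start, end) with 1-based inclusive coordinates on the
    template as listed, or None if the primer matches neither strand.
    """
    pos = template.find(primer)
    if pos >= 0:
        return ("+", pos + 1, pos + len(primer))
    pos = template.find(reverse_complement(primer))
    if pos >= 0:
        return ("-", pos + 1, pos + len(primer))
    return None


if __name__ == "__main__":
    print("SEQ ID NO: 3:", locate(PRIMER_SEQ_ID_3, SEQ_ID_1_PREFIX))
    print("SEQ ID NO: 2:", locate(PRIMER_SEQ_ID_2, SEQ_ID_1_PREFIX))
```

Under these assumptions, SEQ ID NO: 3 is found on the listed strand at nucleotides 6-26 of SEQ ID NO: 1, and SEQ ID NO: 2 on the complementary strand at nucleotides 94-113, consistent with their use as oppositely oriented gene specific primers. 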

In view of this, a second set of experiments was carried out, using primers: 

ccacgtggcatgcatggtcag 

(SEQ ID NO: 4) 
and 

gccatgcagaccccgcgagc 

(SEQ ID NO: 5) 

These primers were designed in view of other known ESTs which showed high 
similarity to semaphorin Z, but in regions other than the regions to which SEQ ID NO: 1 is 
similar. RT-PCR was carried out, and this resulted in an 1801 nucleotide product. For the 
generation of this product, RT-PCR was carried out, as described supra, using an aliquot of 
5ng of cDNA from a pool of mRNA from three isolated tumor breast tissues. In additional 
PCR experiments, SEQ ID NO: 4 or 5 was used with SEQ ID NO: 2 or 3, and these 
experiments generated sequence products approximately 0.8 and 1.0 Kb long. 

The fragment products were separated by agarose gel electrophoresis, purified, cloned 
into pUC18, and sequenced using standard methods. The resulting 1.8 kb sequence is set forth 
as SEQ ID NO: 6 and represents the complete cDNA sequence. This sequence contains the 
entire 568 nucleotide fragment referred to supra (i.e., that obtained via RACE-PCR). The 
sequence exhibited 84% homology to murine semaphorin Z and semaphorin VIb which, 
together with presence of a sema domain, led to the conclusion that the molecule encodes a 
human semaphorin molecule. 

The putative amino acid sequence of the protein encoded thereby is presented as SEQ 
ID NO: 7. A start codon is presumed to be at nucleotides 4-7 of SEQ ID NO: 6, and 
nucleotide 1555 represents the termination of the open reading frame. The open reading 
frame is 1551 base pairs long, spans 13 exons (exons 1-12 and exon 17), and ends with a 250 
base pair 3' untranslated region. 
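
The reading frame can be sanity checked by translating the start of SEQ ID NO: 6 from nucleotide 4 and comparing the result with the first residues of SEQ ID NO: 7. The Python sketch below is illustrative only: it assumes the standard genetic code, uses just the first 30 nucleotides of SEQ ID NO: 6, and the helper name `translate` is hypothetical. 

```python
# Standard genetic code, built from the conventional t, c, a, g ordering.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# First 30 nucleotides of SEQ ID NO: 6, copied from the sequence listing.
SEQ_ID_6_PREFIX = "gccatgcagaccccgcgagcgtcccctccc"

# SEQ ID NO: 7 is listed with 517 residues, i.e. 517 coding codons.
RESIDUES_SEQ_ID_7 = 517
CODING_LENGTH = 3 * RESIDUES_SEQ_ID_7  # 1551 nucleotides


def translate(dna: str, start: int) -> str:
    """Translate lowercase DNA from a 1-based start position.

    Translation stops at the first stop codon or when fewer than three
    nucleotides remain. Returns one-letter amino acid codes.
    """
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate(SEQ_ID_6_PREFIX, 4))  # one-letter codes for the first codons
    print(CODING_LENGTH)
```

Translating from nucleotide 4 gives MQTPRASPP, which corresponds to Met Gln Thr Pro Arg Ala Ser Pro Pro at the start of SEQ ID NO: 7, and 517 codons account for the 1551 base pairs stated for the open reading frame. 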

Homology searches were then carried out, using standard methods. The sequence 
flanked by nucleotides 9 and 1394 of SEQ ID NO: 6 is 84% homologous to a sequence 
flanked by nucleotides 218 and 1606 of semaphorin Z, while the sequence flanked by 
nucleotides 1704 and 1801 of SEQ ID NO: 6 is 85% identical to nucleotides 3682 and 3779 
of semaphorin Z. Further, amino acids 1-169 of SEQ ID NO: 7 are 87% homologous to 
amino acids 1-166 of semaphorin Z. The region defined by nucleotides 321-566 of SEQ ID 
NO: 6 showed no homology to any other molecule in publicly accessible databases. Further, 
sequences defined by nucleotide fragments 72-566, 685-835, and 902-1410 were 
non-homologous with any ESTs associated with cancer. 



EXAMPLE 10 

This example describes further work on analysis of the semaphorin sequence described 
supra. The medium resolution Stanford G3 panel of DNA isolated from human/hamster 
hybrid cell lines, which is commercially available, was assayed, via PCR, using, as primers, 
SEQ ID NO: 3 and: 

aggtagttaa actccatcgc aatc 
(SEQ ID NO: 14). These primers were estimated to amplify a fragment of the semaphorin 
sequence described supra which was about 0.8 kb long, and contained an intronic sequence. 
The analysis indicated that the semaphorin R/6B locus was linked to the STS SHGC-1476 
(lod score = 5.6), located on chromosome 19 (distance of 36 cRs). The marker is not ordered 
on the map, but is linked to SHGC-3305 (lod score = 3.2; distance of 60 cRs). 
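
For readers less familiar with lod scores, the values above can be converted to odds. The short Python sketch below assumes the conventional definition of a lod score as the base 10 logarithm of the odds in favor of linkage; the function name `lod_to_odds` is hypothetical and the calculation is not part of the mapping analysis itself. 

```python
def lod_to_odds(lod: float) -> float:
    """Convert a lod score (log10 of the odds for linkage) to an odds ratio."""
    return 10.0 ** lod


if __name__ == "__main__":
    # Lod scores reported for the two linked markers.
    for marker, lod in (("SHGC-1476", 5.6), ("SHGC-3305", 3.2)):
        print(f"{marker}: lod {lod} corresponds to odds of about {lod_to_odds(lod):,.0f} to 1")
```

On this definition, a lod score of 5.6 corresponds to odds of roughly 400,000 to 1 in favor of linkage, and a lod score of 3.2 to odds of roughly 1,600 to 1. 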

EXAMPLE 11 

A BLAST search was then carried out, using the human sequence described herein, 
and the rat semaphorin Z sequence, against the HTGS database. This search indicated that 
there was a possible, alternatively spliced variant, with additional exons at the 3' end. 

In order to evaluate this possibility, RT-PCR was carried out, using primers designed 
to amplify the potential 3' coding region that is absent from the initial sequence. 

SEQ ID NO: 3 and 

gaggagtttg agacctaccg gc 
(SEQ ID NO: 15) were used. The results of the RT-PCR confirmed that there was, in fact, an 
alternate, longer human semaphorin sequence, the nucleotide sequence of which is set forth 
at SEQ ID NO: 16. The sequence contains an open reading frame 2061 base pairs long, which 
contains 4 additional exons, plus a cryptic acceptor site in the middle of the last exon, i.e., 
exon 17, in the 3' region of the mRNA sequence. This permits the RNA processing and 
subsequent translation of the middle portion of the final exon. In contrast to the nucleotide 
sequence of SEQ ID NO: 6, the isoform encoded by this nucleotide sequence (687 amino 
acids) may contain a transmembrane domain and a short cytoplasmic domain. The amino 
acid sequence is provided as SEQ ID NO: 17. Analysis using the "Pfam 5.2" database, as 
found at http://pfam.wustl.edu/, incorporated by reference, verifies this. 



EXAMPLE 12 

Previously, it had been observed that other semaphorins were involved in metastatic 
processes. See Christensen, et al., Canc. Res. 58(6): 1238-1244 (1998); Martin-Satue, et al., 
J. Surg. Oncol. 72(1): 18-23 (1999); Eckhardt, et al., Mol. Cell Neurosci. 9(5/6): 409-419 (1997). 
To study whether this was the case with the newly identified sequences, their expression in 
human glioblastoma cell lines, regulated by antitumor agents, was tested. The lines T98G 
and A172, which display in vitro invasive activity, were used. 

Glucocorticoid hormones or all-trans-retinoic acid (ATRA) were used to treat these 
cell lines for long periods of time, i.e., 24, 48 and 72 hours. (Glucocorticoid hormones are 
widely used as anti-inflammatory agents and anti-tumoral agents, and are the only 
chemotherapeutic agents available for gliomas and glioblastomas.) Retinoids are known to 
strongly inhibit proliferation and migration of primary cultures of human multiform 
glioblastoma, supporting clinical trials for their use. See Bouterfe, et al., Neurosurgery 
46(2): 419-430 (2000). Total RNA (10 μg) samples were isolated from cells that were untreated, 
received glucocorticoid treatment, or received 10⁻⁵ M ATRA for the listed periods of time. 
Acid ribosomal phosphoprotein P0 was used as an internal control for RNA loading. 

The results indicated that glucocorticoid hormones did not regulate the expression of 
the sequence; however, the all-trans-retinoic acid did so, in a time dependent manner. 
Specifically, the expression was inhibited by 2.5 fold after 24 hours, and 7 fold after 72 hours 
in T98G, and in A172, expression was inhibited by 2.5 and 10 fold, respectively, after 48 and 
72 hours. 
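
The fold values can be restated as the fraction of expression remaining relative to untreated cells. The Python sketch below is illustrative only and rests on an assumption not stated in the text, namely that "N fold inhibition" means expression reduced to 1/N of the untreated level; the names `FOLD_INHIBITION` and `remaining_fraction` are hypothetical. 

```python
# Fold inhibition by ATRA as reported above, keyed by (cell line, hours of treatment).
FOLD_INHIBITION = {
    ("T98G", 24): 2.5,
    ("T98G", 72): 7.0,
    ("A172", 48): 2.5,
    ("A172", 72): 10.0,
}


def remaining_fraction(fold: float) -> float:
    """Fraction of untreated expression left after an N-fold inhibition.

    Assumes that N-fold inhibition means expression falls to 1/N of control.
    """
    if fold <= 0:
        raise ValueError("fold inhibition must be positive")
    return 1.0 / fold


if __name__ == "__main__":
    for (line, hours), fold in FOLD_INHIBITION.items():
        pct = 100.0 * remaining_fraction(fold)
        print(f"{line}, {hours} h: {fold} fold inhibition, about {pct:.0f}% of control")
```

Read this way, a 2.5 fold inhibition leaves about 40% of control expression, 7 fold about 14%, and 10 fold about 10%. 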

In the case of T98G, a 3.8 kb band was observed which appears to represent SEQ ID 
NO: 6, and is downregulated by ATRA. 

These results suggest that downregulation of the human semaphorin gene described 
herein underlies the anti-tumor action of all-trans-retinoic acid in human glioblastoma cells. 
This suggests a possible role for semaphorin gene product in tumor progression, which is 
consistent with prior reports on other members of the semaphorin family, including M-SemaH 
and semaphorin E, in the progression of murine mammary and human metastatic lung 
adenocarcinomas. See Christensen, et al., supra; Martin-Satue, et al., supra. 

EXAMPLE 13 

The expression of human semaphorin in various normal tissues was also tested via 
standard Northern blotting. A commercially available, human 12 lane multiple tissue 
Northern blot was used, where poly(A)+ RNA was isolated, and hybridized to relevant human 
semaphorin sequences as described herein. GAPDH was used as an internal control. 

The results indicated that there was very strong expression in brain (a 4.5 kb band), 
weak expression in heart, spleen, placenta and lung, with no expression in skeletal muscle, 
colon, thymus, kidney, liver, or small intestine tissue, or leukocytes. 

The foregoing examples show that, using the ORESTES methodology described 
herein, an isolated nucleic acid molecule has been discovered which encodes a molecule 
referred to herein as human semaphorin R/6B. "Human semaphorin R/6B" as used herein 
refers to a protein encoded by nucleotides 4-1555 of SEQ ID NO: 6, where nucleotides 4-7 
constitute a start codon, and nucleotides 1553-1555 a termination signal, as well as any 
protein which is encoded by the nucleic acid molecules of the invention, or is equivalent 
thereto, such as proteins encoded by SEQ ID NO: 16. Also a part of the invention are isolated 
nucleic acid molecules which encode this protein, such as the nucleotides which make up the 
2061 base pair ORF of SEQ ID NO: 6, as well as nucleic acid molecules which comprise the 
nucleotide sequence set forth in SEQ ID NO: 1 or 6. Also a part of the invention are isolated 
nucleic acid molecules which comprise the nucleotide sequence defined by nucleotides 
321-566 of SEQ ID NO: 6. More particularly, those nucleic acid molecules which comprise 
nucleotides 72-566 of SEQ ID NO: 6 are part of the invention. Also a part of the invention are 
nucleic acid molecules that comprise nucleotides 72-835 or nucleotides 72-1410 of SEQ ID 
NO: 6. Expression vectors and recombinant cells which comprise these nucleic acid 
molecules are also a part of the invention. "Expression vector" as used herein refers to any 
vector wherein the nucleic acid molecule is operably linked to a promoter. Recombinant cells 
in accordance with the invention are preferably eukaryotic cells, and may comprise the 
expression vector. With respect to the nucleic acid molecules, cDNA is preferred, but 
genomic DNA is also a part of the invention. 

The proteins which are a part of this invention may be admixed with, e.g., 
pharmaceutically acceptable adjuvants, such as those which are well known to the skilled 
artisan. Such compositions can be used, e.g., but not exclusively, to produce antibodies, such 
as monoclonal antibodies. These, as well as hybridomas producing them, are also a part of 
the invention. These proteins include, e.g., those having amino acid sequences as set forth at 
SEQ ID NO: 7 or SEQ ID NO: 17, as well as those proteins homologous thereto. 

Expression of the nucleic acid molecules and proteins of the invention has been 
correlated to cancer, breast cancer in particular. Hence, yet another aspect of the invention 
is a diagnostic method for determining the possible presence of cancer, breast cancer or 
glioblastoma in particular, by determining expression or presence of one or both of the nucleic 
acid molecules and the proteins of the invention. One can carry out these assays via, e.g., 
DNA hybridization assays using SEQ ID NO: 2, 3, 4, 5, 14 or 15, antibody assays, and so 
forth. 

Other aspects of the invention will be clear to the skilled artisan and need not be set 
forth herein. 

The terms and expressions which have been employed are used as terms of description 
and not of limitation, and there is no intention in the use of such terms and expressions of 
excluding any equivalents of the features shown and described or portions thereof, it being 
recognized that various modifications are possible within the scope of the invention. 

WE CLAIM 

forth at SEQ ID NO: 1. 

2. The isolated nucleic acid molecule of claim 1, comprising nucleotides 4-1555 of 
SEQ ID NO: 6. 

3. The isolated nucleic acid molecule of claim 2, comprising SEQ ID NO: 16. 

4. The isolated nucleic acid molecule of claim 1, wherein said molecule is cDNA. 

5. The isolated nucleic acid molecule of claim 1, wherein said molecule is genomic 
DNA. 

6. Expression vector comprising the isolated nucleic acid molecule of claim 1, 
operably linked to a promoter. 

7. Recombinant cell comprising the isolated nucleic acid molecule of claim 1. 

8. The recombinant cell of claim 1, wherein said cell is a eukaryotic cell. 

9. Recombinant cell comprising the expression vector of claim 6. 

10. The recombinant cell of claim 9, wherein said cell is a eukaryotic cell. 

11. Isolated protein, [...] 
sequence of SEQ ID NO: 7 or SEQ ID NO: 17. 

12. Composition comprising the isolated protein of claim [...] 
acceptable adjuvant. 

13. Antibody which specifically binds to the isolated protein of claim 11. 



14. The antibody of claim 13, wherein said antibody is a monoclonal antibody. 

15. Hybridoma cell line which produces the monoclonal antibody of claim 14. 

16. Method for determining possible presence of cancer in a sample comprising assay 
in a sample taken from a patient believed to have cancer for expression of the 
isolated nucleic acid molecule of claim 1 or a protein encoded by said isolated 
nucleic acid molecule, wherein expression of said nucleic acid molecule or said 
protein is indicative of possible presence of cancer. 

17. The method of claim 16, wherein said cancer is breast cancer or glioblastoma. 

18. The method of claim 16, comprising contacting said sample with a pair of 
oligonucleotide primers, each of which is from 17 to 50 nucleotides in length, each 
of which is complementary to the isolated nucleic acid molecule of SEQ ID NO: 1, 
6 or 16. 

19. The method of claim 15, comprising contacting said sample with the 
oligonucleotides, set forth in SEQ ID NOS: 2 and 4, or SEQ ID NOS: 3 and 5. 

20. An isolated nucleic acid molecule which comprises nucleotides 321-566 of SEQ ID 
NO:6. 

21. The isolated nucleic acid molecule of claim 20, comprising nucleotides 72-566 of 
SEQ ID NO: 6. 

22. The isolated nucleic acid molecule of claim 20, comprising nucleotides 72-835 of 
SEQ ID NO: 6. 

23. The isolated nucleic acid molecule of claim 20, comprising nucleotides 72-1410 of 
SEQ ID NO: 6. 

24. An isolated nucleic acid molecule which hybridizes to SEQ ID NO: 1, 6 or 16 and 
consists of at least 17 nucleotides and no more than 50 nucleotides. 

25. The isolated nucleic acid molecule of claim 24, selected from the group consisting 
of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO: 14 or 
SEQ ID NO: 15. 



WO 01/51518 



PCT/US01/01275 








FIG. 2 




FIG. 3 (sequence alignment of the contig described in Example 8) 

<210> 1 
<211> 280 
<212> DNA 

<213> Homo sapiens 

<220> 

<400> 1 

catacgggat gctcttcaca gctactgtta ccgacttcct agccattgat gctgtcatct 60 
accgcagcct cggggacagg cccaccctgc gcaccgtgaa acatgactcc aagtggttca 120 
aaggtgagac caggaagggc agtgggccca gacctggcag ggcccagaac ctgacattca 180 
tcaaatctcc ccaccagaat aagagtccca gaggagcttg gtggagagat ttccaccaca 240 
tctttggaat ctgttggcct ttttgaagac tagaaagcta 280 



<210> 2 
<211> 20 
<212> DNA 

<213> Homo sapiens 

<220> 

<400> 2 

cttggagtca tgtttcacgg 20 



<210> 3 
<211> 21 
<212> DNA 

<213> Homo sapiens 

<220> 

<400> 3 

gggatgctct tcacagctac t 21 



<210> 4 
<211> 21 
<212> DNA 

<213> Homo sapiens 

<220> 
<400> 4 

ccacgtggca tgcatggtca g 21 

<210> 5 

<211> 20 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 5 

gccatgcaga ccccgcgagc 20 



<210> 6 

<211> 1801 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 6 

gccatgcaga ccccgcgagc gtcccctccc cgcccggccc tcctgcttct gctgctgcta 60 
ctggggggcg cccacggcct ctttcctgag gagccgccgc cgcttagcgt ggcccccagg 120 
gactacctga accactatcc cgtgtttgtg ggcagcgggc ccggacgcct gacccccgca 180 
gaaggtgctg acgacctcaa catccagcga gtcctgcggg tcaacaggac gctgttcatt 240 
ggggacaggg acaacctcta ccgcgtagag ttggagcccc ccacgtccac ggagctgcgg 300 
taccagagga agctgacctg gagatctaac cccagcgaca taaacgtgtg tcggatgaag 360 
ggcaaacagg agggcgagtg tcgaaacttc gtaaaggtgc tgctccttcg ggacgagtcc 420 
acgctctttg tgtgcggttc caacgccttc aacccggtgt gcgccaacta cagcatagac 480 
accctgcagc ccgtcggaga caacatcagc ggtatggccc gctgcccgta cgaccccaag 540 
cacgccaatg ttgccctctt ctctgacggg atgctcttca cagctactgt taccgacttc 600 
ctagccattg atgctgtcat ctaccgcagc ctcggggaca ggcccaccct gcgcaccgtg 660 
aaacatgact ccaagtggtt caaagagcct tactttgtcc atgcggtgga gtggggcagc 720 
catgtctact tcttcttccg ggagattgcg atggagttta actacctgga gaaggtggtg 780 
gtgtcccgcg tggcccgagt gtgcaagaac gacgtgggag gctccccccg cgtgctggag 840 
aagcagtgga cgtccttcct gaaggcgcgg ctcaactgct ctgtacccgg agactcccat 900 
ttctacttca acgtgctgca ggctgtcacg ggcgtggtca gcctcggggg ccggcccgtg 960 
gtcctggccg ttttttccac gcccagcaac agcatccctg gctcggctgt ctgcgccttt 1020 
gacctgacac aggtggcagc tgtgtttgaa ggccgcttcc gagagcagaa gtcccccgag 1080 
tccatctgga cgccggtgcc ggaggatcag gtgcctcgac cccggcccgg gtgctgcgca 1140 
gcccccggga tgcagtacaa tgcctccagc gccttgccgg atgacatcct caactttgtc 1200 
aagacccacc ctctgatgga cgaagcggtg ccctcgctgg gccatgcgcc ctggatcctg 1260 
cggaccctga tgaggcacca gctgactcga gtggctgtgg acgtgggagc cggcccctgg 1320 
ggcaaccaga ccgttgtctt cctgggttct gaggcgggga cggtcctcaa gttcctcgtc 1380 
cggcccaatg ccagcacctc agggacgtct gggcgtgtgt gtcaagtggg ccacgcgtgc 1440 
agggtgtgtg tccacgagcg acgatcgtgg tggccccagc ggcctgggcg ttggctgagc 1500 
cgacgctggg gcttccagaa ggcccggggg cctccgaggt gccggttagg agtttgaacc 1560 
ccccccactc tgcagaggga agcggggaca atgccggggt ttcaggcagg agacacgagg 1620 
agggcctgcc cggaagtcac atcggcagca gctgtctaaa gggcttgggg gcctgggggg 1680 
cggcgaaggt gggtggggcc cctctgtaaa tacggcccca gggtggtgag agagtcccat 1740 
gccacccgtc cccttgtgac ctcccccctc tgacctccag ctgaccatgc atgccacgtg 1800 
g 1801 



<210> 7 

<211> 517 

<212> PRT 

<213> Homo sapiens 

<220> 

<400> 7 

Met Gln Thr Pro Arg Ala Ser Pro Pro Arg Pro Ala Leu Leu Leu Leu 
1 5 10 15 

Leu Leu Leu Leu Gly Gly Ala His Gly Leu Phe Pro Glu Glu Pro Pro 
20 25 30 

Pro Leu Ser Val Ala Pro Arg Asp Tyr Leu Asn His Tyr Pro Val Phe 
35 40 45 

Val Gly Ser Gly Pro Gly Arg Leu Thr Pro Ala Glu Gly Ala Asp Asp 
50 55 60 

Leu Asn Ile Gln Arg Val Leu Arg Val Asn Arg Thr Leu Phe Ile Gly 
65 70 75 80 

Asp Arg Asp Asn Leu Tyr Arg Val Glu Leu Glu Pro Pro Thr Ser Thr 
85 90 95 

Glu Leu Arg Tyr Gln Arg Lys Leu Thr Trp Arg Ser Asn Pro Ser Asp 
100 105 110 

Ile Asn Val Cys Arg Met Lys Gly Lys Gln Glu Gly Glu Cys Arg Asn 
115 120 125 

Phe Val Lys Val Leu Leu Leu Arg Asp Glu Ser Thr Leu Phe Val Cys 
130 135 140 

Gly Ser Asn Ala Phe Asn Pro Val Cys Ala Asn Tyr Ser Ile Asp Thr 
145 150 155 160 

Leu Gln Pro Val Gly Asp Asn Ile Ser Gly Met Ala Arg Cys Pro Tyr 
165 170 175 

Asp Pro Lys His Ala Asn Val Ala Leu Phe Ser Asp Gly Met Leu Phe 
180 185 190 

Thr Ala Thr Val Thr Asp Phe Leu Ala Ile Asp Ala Val Ile Tyr Arg 
195 200 205 

Ser Leu Gly Asp Arg Pro Thr Leu Arg Thr Val Lys His Asp Ser Lys 
210 215 220 

Trp Phe Lys Glu Pro Tyr Phe Val His Ala Val Glu Trp Gly Ser His 
225 230 235 240 

Val Tyr Phe Phe Phe Arg Glu Ile Ala Met Glu Phe Asn Tyr Leu Glu 
245 250 255 

Lys Val Val Val Ser Arg Val Ala Arg Val Cys Lys Asn Asp Val Gly 
260 265 270 

Gly Ser Pro Arg Val Leu Glu Lys Gln Trp Thr Ser Phe Leu Lys Ala 
275 280 285 

Arg Leu Asn Cys Ser Val Pro Gly Asp Ser His Phe Tyr Phe Asn Val 
290 295 300 

Leu Gln Ala Val Thr Gly Val Val Ser Leu Gly Gly Arg Pro Val Val 
305 310 315 320 

Leu Ala Val Phe Ser Thr Pro Ser Asn Ser Ile Pro Gly Ser Ala Val 
325 330 335 

Cys Ala Phe Asp Leu Thr Gln Val Ala Ala Val Phe Glu Gly Arg Phe 
340 345 350 

Arg Glu Gln Lys Ser Pro Glu Ser Ile Trp Thr Pro Val Pro Glu Asp 
355 360 365 

Gln Val Pro Arg Pro Arg Pro Gly Cys Cys Ala Ala Pro Gly Met Gln 
370 375 380 

Tyr Asn Ala Ser Ser Ala Leu Pro Asp Asp Ile Leu Asn Phe Val Lys 
385 390 395 400 

Thr His Pro Leu Met Asp Glu Ala Val Pro Ser Leu Gly His Ala Pro 
405 410 415 

Trp Ile Leu Arg Thr Leu Met Arg His Gln Leu Thr Arg Val Ala Val 
420 425 430 

Asp Val Gly Ala Gly Pro Trp Gly Asn Gln Thr Val Val Phe Leu Gly 
435 440 445 

Ser Glu Ala Gly Thr Val Leu Lys Phe Leu Val Arg Pro Asn Ala Ser 
450 455 460 

Thr Ser Gly Thr Ser Gly Arg Val Cys Gln Val Gly His Ala Cys Arg 
465 470 475 480 

Val Cys Val His Glu Arg Arg Ser Trp Trp Pro Gln Arg Pro Gly Arg 
485 490 495 

Trp Leu Ser Arg Arg Trp Gly Phe Gln Lys Ala Arg Gly Pro Pro Arg 
500 505 510 

Cys Arg Leu Gly Val 
515 



<210> 8 

<211> 19 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 8 

gaagctggta aacaaaagg 19 



<210> 9 

<211> 20 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 9 

agctgcatga tgtgagcaag 20 



<210> 10 
<211> 20 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 10 

cccgctcctc ctgagcaccc 20 

<210> 11 

<211> 17 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 11 

gagtcgattt caggttg 17 



<210> 12 

<211> 17 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 12 

tgcttaagtt cagcggg 17 



<210> 13 

<211> 20 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 13 

aggagtgacg gttgatcagt 20 



<210> 14 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 14 

aggtagttaa actccatcgc aatc 

<210> 15 

<211> 22 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 15 

gaggagtttg agacctaccg gc 

<210> 16 

<211> 2311 

<212> DNA 

<213> Homo sapiens 

<220> 

<400> 16 

gccatgcaga ccccgcgagc gtcccctccc cgcccggccc tcctgcttct gctgctgcta 60 
ctggggggcg cccacggcct ctttcctgag gagccgccgc cgcttagcgt ggcccccagg 120 
gactacctga accactatcc cgtgtttgtg ggcagcgggc ccggacgcct gacccccgca 180 
gaaggtgctg acgacctcaa catccagcga gtcctgcggg tcaacaggac gctgttcatt 240 
ggggacaggg acaacctcta ccgcgtagag ttggagcccc ccacgtccac ggagctgcgg 300 
taccagagga agctgacctg gagatctaac cccagcgaca taaacgtgtg tcggatgaag 360 
ggcaaacagg agggcgagtg tcgaaacttc gtaaaggtgc tgctccttcg ggacgagtcc 420 
acgctctttg tgtgcggttc caacgccttc aacccggtgt gcgccaacta cagcatagac 480 
accctgcagc ccgtcggaga caacatcagc ggtatggccc gctgcccgta cgaccccaag 540 
cacgccaatg ttgccctctt ctctgacggg atgctcttca cagctactgt taccgacttc 600 
ctagccattg atgctgtcat ctaccgcagc ctcggggaca ggcccaccct gcgcaccgtg 660 
aaacatgact ccaagtggtt caaagagcct tactttgtcc atgcggtgga gtggggcagc 720 
catgtctact tcttcttccg ggagattgcg atggagttta actacctgga gaaggtggtg 780 
gtgtcccgcg tggcccgagt gtgcaagaac gacgtgggag gctccccccg cgtgctggag 840 
aagcagtgga cgtccttcct gaaggcgcgg ctcaactgct ctgtacccgg agactcccat 900 
ttctacttca acgtgctgca ggctgtcacg ggcgtggtca gcctcggggg ccggcccgtg 960 
gtcctggccg ttttttccac gcccagcaac agcatccctg gctcggctgt ctgcgccttt 1020 
gacctgacac aggtggcagc tgtgtttgaa ggccgcttcc gagagcagaa gtcccccgag 1080 
tccatctgga cgccggtgcc ggaggatcag gtgcctcgac cccggcccgg gtgctgcgca 1140 
gcccccggga tgcagtacaa tgcctccagc gccttgccgg atgacatcct caactttgtc 1200 
aagacccacc ctctgatgga cgaagcggtg ccctcgctgg gccatgcgcc ctggatcctg 1260 
cggaccctga tgaggcacca gctgactcga gtggctgtgg acgtgggagc cggcccctgg 1320 
ggcaaccaga ccgttgtctt cctgggttct gaggcgggga cggtcctcaa gttcctcgtc 1380 
cggcccaatg ccagcacctc agggacgtct gggctcagtg tcttcctgga ggagtttgag 1440 
acctaccggc cggacaggtg tgaacggccc ggcggtggcg agacagggca gcggctgctg 1500 
gactcggggc tgctgagctt ggagctggac gcagcttcgg ggggcctgct ggctgccttc 1560 
ccccgctgcg tggtccgagt gcctgtggct cgctgccagc agtactcggg gtgtatgaag 1620 
aactgtatcg gcagtcagga cccctactgc gggtgggccc ccgacggctc ctgcatcttc 1680 
ctcagcccgg gcaccagagc cgcctttgag caggacgtgt ccggggccag cacctcaggc 1740 
ttaggggact gcacaggact cctgcgggcc agcctctccg aggaccgcgc ggggctggtg 1800 
tcggtgaacc cgctggtaac gtcgtcggtg gcggccttcg tggtgagagc cgtggtgtcc 1860 
ggcttcagcg tgggctggtt cgtgggcctc cgtgagcggc gggagctggc ccggcgcaag 1920 
gacaaggagg ccatcctggc gcacggggcg ggcgaggcgg tgctgagcgt cagccgcctg 1980 
ggcgagcgca gggcgcaggg tcccgggagc cgacgctggg gcttccagaa ggcccggggg 2040 
tctccgaggt gccggttagg agtttgaacc ccccccactc tgcagaggga agcggggaca 2100 
atgccggggt ttcaggcagg agacacgagg agggcctgcc cggaagtcac atcggcagca 2160 
gctgtctaaa gggcttgggg gcctgggggg cggcgaaggt gggtggggcc cctctgtaaa 2220 
tacggcccca gggtggtgag agagtcccat gccacccgtc cccttgtgac ctcccccctc 2280 
tgacctccag ctgaccatgc atgccacgtg g 2311 

<210> 17 
<211> 687 
<212> PRT 

<213> Homo sapiens 

<220> 

<400> 17 

Met Gln Thr Pro Arg Ala Ser Pro Pro Arg Pro Ala Leu Leu Leu Leu 
1 5 10 15 

Leu Leu Leu Leu Gly Gly Ala His Gly Leu Phe Pro Glu Glu Pro Pro 
20 25 30 

Pro Leu Ser Val Ala Pro Arg Asp Tyr Leu Asn His Tyr Pro Val Phe 
35 40 45 

Val Gly Ser Gly Pro Gly Arg Leu Thr Pro Ala Glu Gly Ala Asp Asp 
50 55 60 

Leu Asn Ile Gln Arg Val Leu Arg Val Asn Arg Thr Leu Phe Ile Gly 
65 70 75 80 

Asp Arg Asp Asn Leu Tyr Arg Val Glu Leu Glu Pro Pro Thr Ser Thr 
85 90 95 

Glu Leu Arg Tyr Gln Arg Lys Leu Thr Trp Arg Ser Asn Pro Ser Asp 
100 105 110 

Ile Asn Val Cys Arg Met Lys Gly Lys Gln Glu Gly Glu Cys Arg Asn 
115 120 125 

Phe Val Lys Val Leu Leu Leu Arg Asp Glu Ser Thr Leu Phe Val Cys 
130 135 140 

Gly Ser Asn Ala Phe Asn Pro Val Cys Ala Asn Tyr Ser Ile Asp Thr 
145 150 155 160 

Leu Gln Pro Val Gly Asp Asn Ile Ser Gly Met Ala Arg Cys Pro Tyr 
165 170 175 

Asp Pro Lys His Ala Asn Val Ala Leu Phe Ser Asp Gly Met Leu Phe 
180 185 190 

Thr Ala Thr Val Thr Asp Phe Leu Ala Ile Asp Ala Val Ile Tyr Arg 
195 200 205 

Ser Leu Gly Asp Arg Pro Thr Leu Arg Thr Val Lys His Asp Ser Lys 
210 215 220 

Trp Phe Lys Glu Pro Tyr Phe Val His Ala Val Glu Trp Gly Ser His 
225 230 235 240 

Val Tyr Phe Phe Phe Arg Glu Ile Ala Met Glu Phe Asn Tyr Leu Glu 
245 250 255 

Lys Val Val Val Ser Arg Val Ala Arg Val Cys Lys Asn Asp Val Gly 
260 265 270 

Gly Ser Pro Arg Val Leu Glu Lys Gln Trp Thr Ser Phe Leu Lys Ala 
275 280 285 

Arg Leu Asn Cys Ser Val Pro Gly Asp Ser His Phe Tyr Phe Asn Val 
290 295 300 

Leu Gln Ala Val Thr Gly Val Val Ser Leu Gly Gly Arg Pro Val Val 
305 310 315 320 

Leu Ala Val Phe Ser Thr Pro Ser Asn Ser Ile Pro Gly Ser Ala Val 
325 330 335 

Cys Ala Phe Asp Leu Thr Gln Val Ala Ala Val Phe Glu Gly Arg Phe 
340 345 350 

Arg Glu Gln Lys Ser Pro Glu Ser Ile Trp Thr Pro Val Pro Glu Asp 
355 360 365 

Gln Val Pro Arg Pro Arg Pro Gly Cys Cys Ala Ala Pro Gly Met Gln 
370 375 380 

Tyr Asn Ala Ser Ser Ala Leu Pro Asp Asp Ile Leu Asn Phe Val Lys 
385 390 395 400 

Thr His Pro Leu Met Asp Glu Ala Val Pro Ser Leu Gly His Ala Pro 
405 410 415 

Trp Ile Leu Arg Thr Leu Met Arg His Gln Leu Thr Arg Val Ala Val 
420 425 430 

Asp Val Gly Ala Gly Pro Trp Gly Asn Gln Thr Val Val Phe Leu Gly 
435 440 445 

Ser Glu Ala Gly Thr Val Leu Lys Phe Leu Val Arg Pro Asn Ala Ser 
450 455 460 

Thr Ser Gly Thr Ser Gly Leu Ser Val Phe Leu Glu Glu Phe Glu Thr 
465 470 475 480 

Tyr Arg Pro Asp Arg Cys Glu Arg Pro Gly Gly Gly Glu Thr Gly Gln 
485 490 495 

Arg Leu Leu Asp Ser Gly Leu Leu Ser Leu Glu Leu Asn Ala Ala 
500 505 510 

Ser Gly Gly Leu Leu Ala Ala Phe Pro Arg Cys Val Val Arg Val Phe 
515 520 525 

Val Ala Arg Cys Gln Gln Tyr Ser Cys Cys Met Lys Asn Cys Ile Gly 
530 535 540 

Ser Gln Asp Pro Tyr Cys Gly Trp Ala Pro Asp Gly Ser Cys Ile Phe 
545 550 555 560 

Leu Ser Pro Gly Thr Arg Ala Ala Phe Glu Gln Asp Val Ser Gly Ser 
565 570 575 

Thr Ser Gly Leu Gly Asp Cys Thr Gly Leu Leu Arg Ala Ser Leu Ser 
580 585 590 

Glu Asp Arg Ala Gly Leu Val Ser Val Asn Pro Leu Val Thr Ser Ser 
595 600 605 

Val Ala Ala Phe Val Val Arg Ala Val Val Ser Gly Phe Ser Val Gly 
610 615 620 

Trp Phe Val Gly Leu Arg Glu Arg Arg Glu Leu Ala Arg Arg Lys Asp 
625 630 635 640 

Lys Glu Ala Ile Leu Ala His Gly Ala Gly Glu Ala Val Leu Ser Val 
645 650 655 

Ser Arg Leu Gly Glu Arg Arg Ala Gln Gly Pro Gly Ser Arg Arg Trp 
660 665 670 

Gly Phe Gln Lys Ala Arg Gly Ser Pro Arg Cys Arg Leu Gly Val 
675 680 685 